Hardware implementation of the secure hash standard

ABSTRACT

An integrated circuit for implementing the secure hash algorithm is provided. According to one aspect of the integrated circuit, the integrated circuit includes a data path and a controller controlling operation of the data path. According to another aspect of the integrated circuit, the data path is capable of handling each round of processing reiteratively. The controller further includes an address control module and a finite state machine.

CROSS-REFERENCES TO RELATED APPLICATION(S)

The present application is a continuation-in-part application of U.S. patent application Ser. No. 09/815,122 entitled “ADAPTIVE INTEGRATED CIRCUITRY WITH HETEROGENEOUS AND RECONFIGURABLE MATRICES OF DIVERSE AND ADAPTIVE COMPUTATIONAL UNITS HAVING FIXED, APPLICATION SPECIFIC COMPUTATIONAL ELEMENTS,” filed on Mar. 22, 2001, the disclosure of which is hereby incorporated by reference in its entirety as if set forth in full herein for all purposes.

BACKGROUND OF THE INVENTION

The present invention generally relates to the secure hash standard. More specifically, the present invention relates to a method and system for implementing a secure hash algorithm (SHA-1) specified by the secure hash standard with hardware resources.

The SHA-1 generally operates as follows. The SHA-1 takes as input a message of maximum length which is less than 2⁶⁴ bits. The message is padded, if necessary, to render the total message length a multiple of 512. The message is then converted into 512-bit blocks. The 512-bit blocks are processed sequentially and the cumulative results represent a 160-bit message digest.
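By way of illustration only, the padding and block-splitting steps described above can be sketched in Python as follows. The sketch assumes the padding rule of the Secure Hash Standard (a single 1 bit, zero bits, then the original length as a 64-bit big-endian integer), under which padding is always appended, and it assumes a byte-oriented message. The function names pad_message and split_blocks are hypothetical and do not appear in the specification.

```python
def pad_message(message: bytes) -> bytes:
    """Pad a byte-oriented message to a multiple of 512 bits (64 bytes)."""
    bit_length = len(message) * 8
    if bit_length >= 1 << 64:
        raise ValueError("message must be shorter than 2**64 bits")
    padded = message + b"\x80"                     # a 1 bit followed by seven 0 bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zero-fill up to 448 bits mod 512
    padded += bit_length.to_bytes(8, "big")        # original length, 64-bit big-endian
    return padded


def split_blocks(padded: bytes) -> list:
    """Split a padded message into 512-bit (64-byte) blocks."""
    if len(padded) % 64 != 0:
        raise ValueError("padded message must be a multiple of 64 bytes")
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]
```

A 55-byte message fits in a single block after padding, while a 56-byte message spills into a second block because the length field no longer fits.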

The SHA-1 performs eighty rounds of processing for each 512-bit block. For each of four groups of twenty rounds, the SHA-1 uses one of four Boolean functions and one of four constant values, to be further described below. Once all eighty processing rounds are completed, five 32-bit intermediate variables are updated. The process is then repeated for the next 512-bit block. Once all the 512-bit blocks are processed, the final, cumulative values of the five intermediate variables represent the 160-bit message digest. The details with respect to the processing of the 512-bit blocks will be further described below.

As mentioned above, the SHA-1 converts the message into 512-bit blocks and then processes the 512-bit blocks one at a time. More specifically, each 512-bit block to be processed is divided into sixteen (16) longwords W₀, W₁, . . . , W₁₅, where W₀ is the leftmost longword. Each longword is thirty-two (32) bits in length. The SHA-1 uses a five longword circular buffer to maintain the five 32-bit intermediate variables, a, b, c, d and e.

Prior to processing the first 512-bit block, the intermediate variables are initialized with the constant values H₀ through H₄ (in hex) respectively as follows:

a=H₀=0x67452301

b=H₁=0xEFCDAB89

c=H₂=0x98BADCFE

d=H₃=0x10325476

e=H₄=0xC3D2E1F0

After the intermediate variables are initialized, the processing of the 512-bit blocks takes place as follows:

For t=16 to 79, let W_(t)=S¹(W_(t-3) XOR W_(t-8) XOR W_(t-14) XOR W_(t-16)), where S^(k)( ) represents a k-bit circular left shift.
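The expansion of the sixteen longwords into eighty can be sketched as follows. This is a minimal illustration, not the claimed hardware: it assumes each longword is read from the block in big-endian order, as in the Secure Hash Standard, and the names rotl32 and message_schedule are hypothetical.

```python
def rotl32(x: int, k: int) -> int:
    """S^k: k-bit circular left shift of a 32-bit longword."""
    return ((x << k) | (x >> (32 - k))) & 0xFFFFFFFF


def message_schedule(block: bytes) -> list:
    """Return W_0 .. W_79 for one 512-bit block."""
    if len(block) != 64:
        raise ValueError("a block must be exactly 64 bytes")
    # W_0 .. W_15 are the sixteen longwords of the block, leftmost first.
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    # W_16 .. W_79 follow the recurrence given in the text.
    for t in range(16, 80):
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w
```

Because each new longword depends only on the previous sixteen, a hardware implementation can keep the schedule in a sixteen-word circular buffer rather than storing all eighty values.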

The eighty (80) rounds of processing for each 512-bit block are executed according to the following equations:

For t=0 to 79 do

a=TEMP=S⁵(a)+f_(t)(b, c, d)+e+W_(t)+K_(t)

b=a

c=S³⁰(b)

d=c

e=d

where “+” represents addition modulo 2³², and each right-hand side uses the values that a, b, c, d and e held before the round.

The function f_(t)(b, c, d) and the constant K_(t) vary during the eighty (80) rounds of processing as follows:

f_(t)(b, c, d)=(b AND c) OR (NOT b AND d), for (t=0 to 19);

f_(t)(b, c, d)=b XOR c XOR d, for (t=20 to 39);

f_(t)(b, c, d)=(b AND c) OR (b AND d) OR (c AND d), for (t=40 to 59);

f_(t)(b, c, d)=b XOR c XOR d, for (t=60 to 79)

K_(t)=2³²×(2^(1/2)/4)=0x5A827999 for (t=0 to 19);

K_(t)=2³²×(3^(1/2)/4)=0x6ED9EBA1 for (t=20 to 39);

K_(t)=2³²×(5^(1/2)/4)=0x8F1BBCDC for (t=40 to 59);

K_(t)=2³²×(10^(1/2)/4)=0xCA62C1D6 for (t=60 to 79)
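The selection of the Boolean function and the constant for a given round can be sketched as follows. The sketch also checks the square-root expressions by exact integer arithmetic, taking the integer part of 2³²×(n^(1/2)/4), which equals the integer part of 2³⁰×n^(1/2). The names f, k and k_from_root are hypothetical.

```python
import math

MASK32 = 0xFFFFFFFF


def f(t: int, b: int, c: int, d: int) -> int:
    """Boolean function f_t for round t (0..79) on 32-bit longwords."""
    if not 0 <= t <= 79:
        raise ValueError("t must be in 0..79")
    if t < 20:
        return ((b & c) | (~b & d)) & MASK32    # b chooses between c and d
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)      # bitwise majority
    return b ^ c ^ d                            # rounds 20-39 and 60-79


def k(t: int) -> int:
    """Constant K_t for round t (0..79)."""
    if not 0 <= t <= 79:
        raise ValueError("t must be in 0..79")
    return (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)[t // 20]


def k_from_root(n: int) -> int:
    """Integer part of 2**30 * sqrt(n), computed without floating point."""
    return math.isqrt(n << 60)
```

Evaluating k_from_root for n = 2, 3, 5 and 10 reproduces the four hexadecimal constants listed above.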

After the eighty (80) rounds of processing (t=0 to 79) are completed, i.e., after a 512-bit block is processed, the intermediate variables a, b, c, d and e are updated as follows:

a=a+H₀

b=b+H₁

c=c+H₂

d=d+H₃

e=e+H₄

After processing the last 512-bit block, the message digest is the 160-bit string represented by the five (5) longwords, a, b, c, d and e. The foregoing is a brief description of the SHA-1. Details with respect to the operations of the SHA-1 are well understood.
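For reference, the complete computation summarized above can be sketched in software and compared against the SHA-1 provided by Python's standard hashlib module. This is a minimal sketch under stated assumptions, not the hardware of the invention: it reads the update step as in the Secure Hash Standard, where the sums become the starting values of H₀ through H₄ for the next block, and it assumes big-endian longwords and a byte-oriented message. The names compress, sha1 and agrees_with_hashlib are hypothetical.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF
H_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)


def rotl32(x, k):
    """k-bit circular left shift of a 32-bit longword."""
    return ((x << k) | (x >> (32 - k))) & MASK32


def f(t, b, c, d):
    """Boolean function for round t."""
    if t < 20:
        return ((b & c) | (~b & d)) & MASK32
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d


def compress(h, block):
    """Process one 512-bit block and return the updated five longwords."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = h
    for t in range(80):
        temp = (rotl32(a, 5) + f(t, b, c, d) + e + w[t] + K[t // 20]) & MASK32
        # Simultaneous assignment: every right-hand side uses pre-round values.
        a, b, c, d, e = temp, a, rotl32(b, 30), c, d
    # Update round: add the round results to the values held before the block.
    return tuple((x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e)))


def sha1(message: bytes) -> bytes:
    """Return the 160-bit message digest as 20 bytes."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += (8 * len(message)).to_bytes(8, "big")
    h = H_INIT
    for i in range(0, len(padded), 64):
        h = compress(h, padded[i:i + 64])
    return struct.pack(">5I", *h)


def agrees_with_hashlib(message: bytes) -> bool:
    """Compare this sketch with the standard library's SHA-1."""
    return sha1(message) == hashlib.sha1(message).digest()
```

The tuple assignment inside the round loop mirrors the five equations given earlier, so no separate TEMP ordering is needed.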

The SHA-1 is typically implemented using software. A person of ordinary skill in the art will know how to implement the SHA-1 using software. Using software to implement the SHA-1, however, has a number of shortcomings. For example, it is relatively easy to break into a software program designed to implement the SHA-1 thereby revealing that the SHA-1 is used for encrypting messages. By ascertaining the type of encryption algorithm that is being used to encrypt messages, a hacker may then successfully decrypt the message digests to obtain the messages. Hence, it would be desirable to provide a method and system that is capable of offering more secure implementation of the SHA-1.

SUMMARY OF THE INVENTION

According to one exemplary embodiment of the present invention, an integrated circuit for implementing the secure hash algorithm is provided. According to this exemplary embodiment, the integrated circuit includes a data path and a controller controlling operation of the data path. The data path is capable of handling each round of processing reiteratively. In one implementation, the data path includes a data multiplexor, an address multiplexor, a memory, a first processing multiplexor, a second processing multiplexor, a first register, a second register, a shifter and an arithmetic logic unit. By coupling these various components of the data path, as further described below, the data path can be used to execute the secure hash algorithm in a reiterative manner.

In another implementation, the controller includes an address control module and a finite state machine. The address control module further includes a pico code ROM and a number of counters. The address control module uses a pico code memory address, the state of the finite state machine and various counter bits to generate a physical memory address and appropriate control bits to control the operation of the data path.

Reference to the remaining portions of the specification, including the drawings and claims, will realize other features and advantages of the present invention. Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with respect to the accompanying drawings, in which like reference numbers indicate identical or functionally similar elements.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a simplified block diagram illustrating an exemplary embodiment of a data path for data processed pursuant to the SHA-1 in accordance with the present invention;

FIG. 2 is a simplified block diagram illustrating an exemplary embodiment of a controller used to control operation of the data path shown in FIG. 1 in accordance with the present invention;

FIG. 3 is an illustrative diagram showing an exemplary embodiment of a data structure used to store data for controlling operation of the controller and the data path in accordance with the present invention;

FIG. 4 is an illustrative diagram showing an exemplary embodiment of a memory map in accordance with the present invention;

FIG. 5 is an illustrative diagram showing an exemplary embodiment of pico code for memory address generation in accordance with the present invention; and

FIGS. 6 a-c are selected illustrative timing diagrams showing operations of the respective components of the data path in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention in the form of one or more exemplary embodiments is now described. According to an exemplary embodiment of the present invention, an integrated circuit is provided to implement the Secure Hash Algorithm (SHA-1) specified by the Secure Hash Standard as promulgated by the National Institute of Standards and Technology.

The parallelizability of the SHA-1 allows a continuum of hardware implementations that trade performance against hardware complexity. Assume that performance/throughput is represented by the following equation:

Throughput=(512×f_(max))/(81×m) bits per second

where f_(max) represents the maximum clock frequency, 81 represents 80 processing rounds plus one update round, and m represents the number of clock periods required for each processing round.

In one implementation where m=16 and f_(max)=100 MHz, the resulting performance is calculated to be 39.5 Mb/s or 4.94 MB/s, or approximately five (5) kilobytes per millisecond. Experimentally, it has been determined that the 5 MB/s implementation requires approximately 1500 gates, 128 bytes of RAM and 132 bytes of ROM. In another implementation having an approximate order of magnitude increase in hardware, with m=1 and f_(max)=100 MHz, a performance of 79 MB/s, or 79 kilobytes per millisecond, is achieved.
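The two performance figures can be reproduced directly from the throughput equation. The following is a small sketch of that arithmetic; the function name throughput_bits_per_second is hypothetical.

```python
def throughput_bits_per_second(f_max_hz: float, m: int) -> float:
    """512 bits per block, 81 rounds per block, m clock periods per round."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return (512 * f_max_hz) / (81 * m)


def throughput_megabytes_per_second(f_max_hz: float, m: int) -> float:
    """Same figure expressed in units of 10**6 bytes per second."""
    return throughput_bits_per_second(f_max_hz, m) / 8 / 1e6
```

With f_max_hz = 100e6, m = 16 gives roughly 39.5 million bits per second (about 4.94 MB/s), and m = 1 gives roughly 79 MB/s, a sixteenfold increase.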

FIG. 1 is a simplified block diagram illustrating the data path of data processed pursuant to the SHA-1 in accordance with the present invention. As shown in FIG. 1, the data path 10 includes a data multiplexor 12, an address multiplexor 14, a memory 16, a first processing multiplexor 18, a first register 20, a second register 22, a shifter 24, a second processing multiplexor 26 and an arithmetic logic unit 28. More specifically, the data multiplexor 12 and the address multiplexor 14 are coupled to the memory 16 to control the output of the memory 16. The output from the memory 16, in turn, is coupled to the first and second processing multiplexors 18, 26. In addition, the first processing multiplexor 18 also receives the output of the arithmetic logic unit 28. The output of the first processing multiplexor 18 is coupled to the first register 20. The output of the first register 20 is coupled to the shifter 24. The output of the shifter 24 is provided to both the arithmetic logic unit 28 and the data multiplexor 12. Furthermore, the output of the arithmetic logic unit 28 is also fed to the second register 22. The output of the second register 22 is coupled to the second processing multiplexor 26. The output of the second processing multiplexor 26 is provided to the arithmetic logic unit 28. It should be noted that the data path 10 does not address issues such as message padding, endianness, input/output, etc. A person of ordinary skill in the art will be able to address these issues.

In an exemplary embodiment, the data path 10 shown in FIG. 1 is controlled by a controller. An exemplary embodiment of the controller is shown in FIG. 2. As shown in FIG. 2, the controller 30 includes a finite state machine 32 and an address control module 34. The finite state machine 32 functions in cooperation with the address control module 34 to control the data path 10. Furthermore, in one exemplary embodiment, the address control module 34 is comprised of a number of components, including a first mod-16 counter 36, a second mod-16 counter 38, a third mod-16 counter 40, a mod-5 counter 42, a ROM 44 and a memory address generator 46. The output of the second mod-16 counter 38 is coupled to the third mod-16 counter 40, the mod-5 counter 42 and the ROM 44. The output of the mod-5 counter 42 is provided to the first mod-16 counter 36. The ROM 44 is coupled to the memory address generator 46. Finally, the respective outputs of the ROM 44 and the memory address generator 46 are provided to the data path 10.

The finite state machine 32 is capable of assuming a number of states. In the exemplary embodiment shown in FIG. 2, the finite state machine 32 can assume one of four (4) different states. The inputs, outputs and respective logic conditions that produce the different states for the finite state machine 32 are shown in FIG. 2.

According to an exemplary embodiment, the data stored within the ROM 44 is organized in a pico code format. FIG. 3 shows an exemplary embodiment of the pico code format. The data stored within the ROM 44 is used to control operation of the controller and the data path 10. More specifically, the ROM 44 contains a number of pico codes. Each pico code is designed to direct the controller and the data path 10 to perform a specific operation. As shown in FIG. 3, each pico code has a length of sixteen (16) bits. Bits (0-7) and (13) are used to control the operation of the various components of the data path 10. For example, bits (0) and (1) are respectively used to control the first and second registers 20, 22; bits (2) and (3) are respectively used to control the first and second processing multiplexors 18, 26; bits (4) and (5) are used to control the arithmetic logic unit 28; bits (6) and (7) are used to control the shifter 24; bits (8-12) are used to represent the pico code memory address, which is then used to generate the physical memory address for accessing the memory 16; and bit (13) is used to control the type of operation to be performed in the memory 16.
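The bit assignments described above can be illustrated by a short decoding sketch. It assumes that bit (0) is the least significant bit of the sixteen-bit pico code, and it leaves bits (14) and (15) undecoded because the text does not describe them. The field names and the function name decode_pico_code are hypothetical, and the meaning of each field value is not specified here.

```python
from typing import NamedTuple


class PicoCode(NamedTuple):
    reg1: int      # bit 0: control for the first register 20
    reg2: int      # bit 1: control for the second register 22
    mux1: int      # bit 2: control for the first processing multiplexor 18
    mux2: int      # bit 3: control for the second processing multiplexor 26
    alu: int       # bits 4-5: control for the arithmetic logic unit 28
    shifter: int   # bits 6-7: control for the shifter 24
    mem_addr: int  # bits 8-12: pico code memory address
    mem_op: int    # bit 13: type of memory operation


def decode_pico_code(word: int) -> PicoCode:
    """Split a 16-bit pico code into its control fields (bit 0 assumed LSB)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("a pico code is sixteen bits long")
    return PicoCode(
        reg1=word & 1,
        reg2=(word >> 1) & 1,
        mux1=(word >> 2) & 1,
        mux2=(word >> 3) & 1,
        alu=(word >> 4) & 0b11,
        shifter=(word >> 6) & 0b11,
        mem_addr=(word >> 8) & 0b11111,
        mem_op=(word >> 13) & 1,
    )
```

Packing all control and address information into one ROM word lets a single fetch configure every component of the data path for one clock period.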

The memory 16 is organized based on a memory map. FIG. 4 shows an exemplary embodiment of the memory map. Referring to FIG. 4, the physical memory address, A[4:0], is five (5) bits in length. The use of the 5-bit physical memory address means that there are thirty-two (32) addressable words in the memory 16. Each word is preferably sixteen (16) bits in length. The thirty-two (32) words are used to represent the variables that are needed to carry out the SHA-1. For example, some of the thirty-two (32) available words may be used to represent the sixteen (16) longwords that are used for each of the eighty (80) rounds of SHA-1 processing, the five (5) intermediate variables (a, b, c, d and e), the five (5) initialization values (H₀-H₄), and the four (4) processing constants K_(t=0-19), K_(t=20-39), K_(t=40-59) and K_(t=60-79).

As mentioned above, the pico code memory address is used to generate the physical memory address for accessing the memory 16. Generally, the physical memory address is generated from the pico code memory address, the state of the finite state machine 32, and various counter bits from the second mod-16 counter 38. FIG. 5 shows an exemplary embodiment of the pico code memory address used for generating the physical memory address to access the memory 16.

The physical memory address, A[4:0], used to access the memory 16 is generated from the pico code memory address in the following manner. When the pico code memory address bits [12-11] are “00”, A[4] is set to “0” and A[3:0] is determined as follows: (constant+t (mod 16)) mod 16, where the constant is:

pico code memory address bits
[8]   [9]   constant
0     0     0x0
0     1     0x8
1     0     0x2
1     1     0xD

When the pico code memory address bits [12-11] are “01”, A[4], A[2] and A[1] are set to “1”. A[3] is set as follows: if [t>=40], then A[3] is set to “1”, else A[3] is set to “0”. A[0] is set as follows: if ([20<=t<=39] OR [t>=60]), then A[0] is set to “1”, else A[0] is set to “0”.

When the pico code memory address bits [12-11] are “10”, A[4] is set to “1” and A[3] is set to “0”. A[2:0] are set as follows using the state of the finite state machine 32 and the pico code memory address bits [10-8]:

if ([FSM_STATE=INIT] OR [FSM_STATE=UPDATE]) then A[2:0]=bits[10-8] else

if ([bits[10-8]=“101”] AND [t<20]) then A[2:0]=(“001”−t (mod 5)) mod 5

else if ([bits[10-8]=“101”] AND [t>=20]) then A[2:0]=(“011”−t (mod 5)) mod 5

else if ([bits[10-8]=“111”] AND [t<20]) then A[2:0]=(“011”−t (mod 5)) mod 5

else if ([bits[10-8]=“111”] AND [t>=20]) then A[2:0]=(“001”−t (mod 5)) mod 5

else A[2:0]=(bits[10-8]−t (mod 5)) mod 5

When the pico code memory address bits [12-11] are “11” then A[4:0] are set to the pico code memory address bits [12-8].
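The four address-generation cases can be collected into one sketch. It rests on several assumptions: the five-bit argument pico_addr holds pico code bits [12-8] with bit [8] least significant; the constant table is keyed literally by (bit [8], bit [9]) as listed in the text; A[4], A[2] and A[1] are the bits forced to “1” in the “01” case; and any state name other than INIT or UPDATE stands for a processing state (the name ROUND used in the examples is hypothetical, as is the function name physical_address).

```python
# Offsets into the sixteen-longword circular buffer, keyed by (bit 8, bit 9).
W_OFFSETS = {(0, 0): 0x0, (0, 1): 0x8, (1, 0): 0x2, (1, 1): 0xD}


def physical_address(pico_addr: int, t: int, fsm_state: str) -> int:
    """Return the 5-bit physical memory address A[4:0] for round t."""
    if not 0 <= pico_addr < 32:
        raise ValueError("pico code memory address is five bits")
    if not 0 <= t < 80:
        raise ValueError("t must be in 0..79")
    top = pico_addr >> 3        # pico code bits [12-11]
    low3 = pico_addr & 0b111    # pico code bits [10-8]

    if top == 0b00:             # longword buffer: A[4] = 0
        bit8, bit9 = pico_addr & 1, (pico_addr >> 1) & 1
        return (W_OFFSETS[(bit8, bit9)] + t % 16) % 16

    if top == 0b01:             # round constant: A[4] = A[2] = A[1] = 1
        a3 = int(t >= 40)
        a0 = int(20 <= t <= 39 or t >= 60)
        return 0b10110 | (a3 << 3) | a0

    if top == 0b10:             # intermediate variables: A[4] = 1, A[3] = 0
        if fsm_state in ("INIT", "UPDATE"):
            a20 = low3
        else:
            if low3 == 0b101:
                base = 0b001 if t < 20 else 0b011
            elif low3 == 0b111:
                base = 0b011 if t < 20 else 0b001
            else:
                base = low3
            a20 = (base - t % 5) % 5
        return 0b10000 | a20

    return pico_addr            # "11": address passed through unchanged
```

Under these assumptions, the four constants 0x0, 0x8, 0x2 and 0xD equal −16, −8, −14 and −3 modulo 16, so they appear to select W_(t-16), W_(t-8), W_(t-14) and W_(t-3) from the circular buffer, and the mod 5 arithmetic appears to rotate the roles of a, b, c, d and e by renaming addresses rather than moving data.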

Operations of the data path 10 are illustrated by a number of selected timing diagrams. FIGS. 6 a-c are selected illustrative timing diagrams showing operations of the respective components of the data path 10. More specifically, FIG. 6 a is a timing diagram illustrating the operation of various components of the data path 10 when initializing the intermediate variables (a, b, c, d and e) with the initialization constants (H₀-H₄); FIG. 6 b is a timing diagram illustrating the operation of various components of the data path 10 for one round (round t=57) of SHA-1 processing; and FIG. 6 c is a timing diagram illustrating the operation of various components of the data path 10 for the intermediate variable update round.

In an exemplary embodiment, the data path 10 and the controller including the finite state machine 32 and the address control module 34 are implemented as part of an integrated circuit using hardware. The integrated circuit can be embedded in a mobile communication device, such as a mobile phone, where encryption and decryption functions are desired for security purposes. Furthermore, the data path 10 and the controller can be implemented using reconfigurable hardware resources within an adaptive computing architecture. Details relating to the adaptive computing architecture and how reconfigurable hardware resources are used to implement functions on an on-demand basis are disclosed in U.S. patent application Ser. No. 09/815,122 entitled “ADAPTIVE INTEGRATED CIRCUITRY WITH HETEROGENEOUS AND RECONFIGURABLE MATRICES OF DIVERSE AND ADAPTIVE COMPUTATIONAL UNITS HAVING FIXED, APPLICATION SPECIFIC COMPUTATIONAL ELEMENTS,” filed on Mar. 22, 2001, the disclosure of which is hereby incorporated by reference in its entirety as if set forth in full herein for all purposes. Based on the disclosure provided herein, it will be appreciated by a person of ordinary skill in the art that the present invention can be implemented using hardware in various different manners.

It should also be understood that based on the disclosure provided herein, it will be appreciated by a person of ordinary skill in the art that minor modifications can be made to the present invention to accommodate and implement a number of other encryption/decryption algorithms.

It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference for all purposes in their entirety. 

1. An integrated circuit for implementing a secure hash algorithm, comprising: a data path configured to process an input message pursuant to the secure hash algorithm; and a controller configured to control operation of the data path; wherein the data path and the controller are implemented using hardware components.
 2. An integrated circuit for implementing the secure hash algorithm, comprising: a data path circuit comprising: a memory configured to store a plurality of variables that are used to carry out the secure hash algorithm; a first multiplexor coupled to the memory; a first register coupled to the first multiplexor; a shifter coupled to the first register; an arithmetic logic unit coupled to the shifter and the first multiplexor; a second register coupled to the arithmetic logic unit; and a second multiplexor coupled to the second register, the memory and the arithmetic logic unit; and a controller configured to control operation of the data path circuit, comprising: an address control module; and a finite state machine operable in conjunction with the address control module to generate a physical memory address for accessing the memory and a plurality of control bits, the physical memory address and the plurality of control bits are used to control operation of the data path circuit. 